A Method for Mapping the Active Sites Bound by Enzymes that Covalently Modify

Substrate Molecules 

Field of Invention 

This invention is immediately relevant to many medical fields including inflammation, 
autoimmunity, transplantation, cancer, anti-microbials, virology, metabolic disease and 
allergy. Methods to identify selective substrates of specific enzymes are indicated, based on 
the detailed mapping of the substrate binding site using combinatorial peptide libraries. 
These enzymes can be any molecule that covalently modifies its physiological substrate 
target, examples of which include, but are not limited to, protein kinases, protein 
phosphatases, acetylases and ribosylases. These derived substrate-based compounds can 
serve as a basis for further medicinal chemistry development of selective enzyme 
inhibitors. The identification of short peptidic substrates using this methodology will also 
allow for the rapid development of high throughput screens for compound screening.

Background to Invention 

The mapping method is exemplified using members of the protein kinase enzyme family, 
but this method is applicable to other covalently modifying enzymes. 

Phosphate transfer (phosphorylation) is the most common form of covalent protein 
modification used by cells. Protein kinases are the enzymes that catalyse the transfer of the 
γ-phosphate from adenosine triphosphate (ATP) to an amino acid residue (usually tyrosine,
threonine, serine or histidine) on a substrate molecule. Approximately 400 kinases are 
currently known, and it is likely that this number will increase considerably in the next few 
years as more information from gene sequencing databases becomes available.
Functionally, these molecules are intracellular enzymes that play key roles in cell growth, 
differentiation and inter-cell communication. Aberrant protein kinase activity has been 
implicated in many disease states including several forms of cancer and severe-combined 
immunodeficiency disease (Barker et al 1997; Lehtola et al; Arpaia et al 1994; Elder et al
1995; Roifman, 1995). Similarly, activation of protein kinase activity within mononuclear 
cells is required to drive the cytokine production which underlies many autoimmune 
diseases (Lee et al, 1994). Thus, inhibitor compounds capable of specifically inactivating 
certain critical kinases may have considerable therapeutic benefit in a number of clinical 
diseases. 

All tyrosine and serine/threonine protein kinases have a region of approximately 300 amino 
acids known as the catalytic subunit which has evolved from a common ancestor kinase 
(Hanks and Quinn, 1991). Crystal structure determination of several kinases has shown that 
they all have a common bi-lobal structure (Wilson et al 1996; Zhang et al 1994; Xu et al 
1997). The amino-terminal part of the subunit encodes a small lobe responsible for the
binding of ATP, whereas the carboxy-terminal residues encode a larger lobe important for
protein substrate binding. In the tertiary structure of the active kinase, both the ATP and 
the protein substrate binding sites are brought together allowing transfer of the ATP
γ-phosphate to the amino acid acceptor on the protein substrate. The protein/peptide binding
groove stretches across the face of the large lobe between two α-helices and under the
small lobe. This groove therefore contains the residues important for defining the substrate 
specificity of the kinase. 

Many protein kinases are arranged in kinase cascades within the cell, providing the ability 
for signal amplification in post-transduction pathways. This amplification relies on the 
upstream kinase specifically activating its downstream partner. For this reason, protein 
kinases have developed remarkable substrate specificities which prevent unwanted
crosstalk between different kinase cascades. This substrate specificity can be exploited in 
the design of selective protein kinase inhibitors. 

A technique has recently been developed by L. Cantley's laboratory to provide consensus
peptide protein kinase substrate information (WO 95/18823, Songyang et al, 1994). First, a
degenerate library of peptides with a phospho-acceptor such as tyrosine or serine/threonine 
flanked by amino acids on each side is synthesised. A preferred number of degenerate 
residues on each side of the phosphorylation site is four (corresponding to positions -1, -2,
-3, -4, +1, +2, +3, +4) relative to the phosphorylated residue. Thus the library consists of
peptides having a length of nine amino acids. The library is then phosphorylated by the 
protein kinase of interest and phosphorylated peptides isolated from the non- 
phosphorylated peptides by DEAE-sephacel and ferric chelation chromatography. The 
phosphopeptide mixture is then sequenced and the frequency of each amino acid at every 
position assessed to give a preferred substrate sequence. These studies have yielded
consensus substrate information, but do not allow a detailed analysis of particular 
preferences for neighbouring residue interactions as pools of peptides are examined. 
Furthermore, this type of analysis may not show up rare good substrates which could be 
hidden by the presence of numerous poor substrates in the peptide pool. By this method
individual peptides can never be identified as individual sequences; the result is only an
average picture of substrate specificity. Part of the problem is that each
individual peptide is represented at such a low level, and many inevitably will not even be 
present. The results from Cantley's method do not represent individual peptides but a 
consensus picture of protein kinase substrate specificity. 

Filamentous phage expressing gene III-linked degenerate peptide sequences have also been
used to generate substrate information (Schmitz et al, 1996; Dente et al, 1997); however,
this method is labour intensive and does not allow the use of unnatural amino acids or
peptidomimetics. Substrate information can also be obtained from knowledge of the 
physiological kinase substrates. This approach is limited, and previous attempts to utilise
this information for the design of successful therapeutic cell permeable protein kinase
inhibitors have failed (Kemp et al, 1991).

We believe that identification of detailed substrate characteristics can intimately map the 
substrate binding groove and provide information that can lead to the design of enzyme 
inhibitor molecules. For the reasons described above, there are no current methods for 
obtaining this information. Therefore, we have invented a method of using small molecules, 
in a self-deconvoluting library format, to probe a larger active site by positional scanning of
a target group. This method is rapid, not labour intensive and results in the identification of 
discrete sequences. 

Summary of Invention 

This invention provides for the active site mapping of enzymes which catalyse covalent 
modification including, but not limited to, phosphorylation, acylation and dephosphorylation,
in which a fixed residue (hereafter known as the catalytic residue) such as a tyrosine, 
serine, threonine, histidine, aspartic acid residue or any other residue containing an 
appropriate side chain is modified. The method of the invention has an additional level of 
complexity over and above that of the self-deconvoluting libraries described in 
WO 97/42216 and Example 5 (the content of which is incorporated herein by reference,
where legally permissible). 

This involves making a library of smaller libraries (referred to as sub-sets) where a fixed 
residue is moved stepwise through the sequence of amino acids or other groups (such as 
peptidomimetics [any compound that can be added to the substrate or inhibitor chain]). The
result is that in each sub-set of the library the fixed residue is found in a different position. 
For example, in a library using four variable positions, five sub-sets in each library have to 
be made, ZXXXX, XZXXX, XXZXX, XXXZX and XXXXZ where Z is the fixed residue 
and X are the four variable residues. We recognise that there may be a need in certain 
circumstances for further invariant residues, however these would occupy fixed positions 
and would not affect either the scanning or the self-deconvoluting of the libraries. The
invariant residue(s) might be fixed in position relative to the modifiable residue Z or may 
be fixed in position relative to the overall motif sequence. Additional fixed residues can be 
added if desired, or one of the variable residues can be made invariant. In the latter case the
library would be a small part of the libraries described here. Cases where it is desirable to 
include one or more fixed residues include libraries required to look at enzymes which 
always require another invariant residue in another position. However, in cases where two
fixed residues are required, and they are both modified, it can be desirable to include the
second residue in one of the variable positions (i.e. make it one of the residues chosen in a variable
position). The reason for this is that the sequence of events (the order in which the two 
residues become modified) can then also be probed by this scanning library technique. In 
this case it may also be beneficial to make an additional library in which the fixed residue is 
not present at all, corresponding to the library XXXX. We would therefore have a library of 
six sub-sets. These modifications are within the scope of the invention and would be 
recognised by someone skilled in the art. 
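
The sub-set layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `scanning_subsets` and its parameters are hypothetical and do not come from the specification; `Z` and `X` follow the notation used in the text.

```python
def scanning_subsets(n_variable=4, fixed="Z", variable="X", include_unfixed=False):
    """Return positional-scanning sub-set patterns for one library.

    The fixed residue is stepped through every position of a peptide
    that has `n_variable` variable positions, giving n_variable + 1 sub-sets.
    """
    length = n_variable + 1
    patterns = []
    for pos in range(length):
        residues = [variable] * length
        residues[pos] = fixed          # place the fixed residue at this position
        patterns.append("".join(residues))
    if include_unfixed:
        # Optional extra sub-set with no fixed residue at all (XXXX in the text).
        patterns.append(variable * n_variable)
    return patterns


print(scanning_subsets())
print(scanning_subsets(include_unfixed=True))
```

With four variable positions the first call lists the five sub-sets named in the text, and the second call adds the XXXX library to give six sub-sets.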

It can be readily seen that by combining the data from each library sub-set, the residues 
from -4 to +4 either side of the catalytic residue can be mapped: 

A-B-C-D-Z 
B-C-D-Z-E 
C-D-Z-E-F
D-Z-E-F-G 
Z-E-F-G-H 



The mapped sequence would therefore be A-B-C-D-Z-E-F-G-H. 

The above example using 5 subsets of libraries of peptides of 5 amino acids allows the 
mapping of a sequence of 9 amino acids. In general one could carry out the invention using 
n subsets of n-mer peptides so as to provide mapping data for the residues from -(n-1) to
+(n-1) either side of the active site. Thus in general the length of the mapped sequence
would be (2n)-1.
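
As a minimal sketch of how the overlapping sub-set data combine, the following Python function aligns one n-mer per sub-set on the fixed residue and reads off the positions from -(n-1) to +(n-1). It assumes an idealised case in which the overlapping residues agree between sub-sets, as in the A to H example above; the function name `merge_subset_hits` is hypothetical.

```python
def merge_subset_hits(hits, fixed="Z"):
    """Merge one n-mer per sub-set into a single map of (2n - 1) residues.

    Each hit is a list of residues containing the fixed residue exactly once.
    Residues are indexed by their offset from the fixed residue.
    """
    n = len(hits[0])
    mapped = {}
    for hit in hits:
        if len(hit) != n:
            raise ValueError("all hits must have the same length")
        z = hit.index(fixed)
        for i, residue in enumerate(hit):
            offset = i - z
            # Idealised assumption: overlapping positions must agree.
            if mapped.setdefault(offset, residue) != residue:
                raise ValueError(f"conflicting residues at position {offset:+d}")
    offsets = range(-(n - 1), n)
    missing = [o for o in offsets if o not in mapped]
    if missing:
        raise ValueError(f"positions not covered: {missing}")
    return [mapped[o] for o in offsets]


subsets = ["A-B-C-D-Z", "B-C-D-Z-E", "C-D-Z-E-F", "D-Z-E-F-G", "Z-E-F-G-H"]
merged = merge_subset_hits([s.split("-") for s in subsets])
print("-".join(merged), len(merged))
```

For the five 5-mer sub-sets above this gives a nine-residue map, consistent with (2n)-1 for n = 5. Real screening data would not necessarily be this consistent, so the conflict check is an illustrative choice rather than part of the described method.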

Where the residue type at any given position relative to the fixed residue is similar in 
different subsets, the data can be used in an additive manner. For example, if an aromatic 
residue is required adjacent to the fixed residue, then any sequences which contain this 
feature in any of the library subsets can be considered in an additive way. 
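
One way to picture this additive use of the data is to tally residues by their position relative to the fixed residue, pooling hits from every sub-set. The sketch below is an assumption-laden illustration: the hit sequences are invented for the example, and the names `tally_by_offset`, `aromatic_count` and `EXAMPLE_HITS` are hypothetical.

```python
from collections import Counter, defaultdict

AROMATIC = set("FWY")

# Hypothetical hits, keyed by the index of the fixed tyrosine in each sub-set.
EXAMPLE_HITS = {
    3: ["DEDYF"],   # fixed Y at index 3
    2: ["EDYFE"],   # fixed Y at index 2
    1: ["DYWEA"],   # fixed Y at index 1
}


def tally_by_offset(hits_by_subset):
    """Count residues at each offset from the fixed residue across all sub-sets."""
    tallies = defaultdict(Counter)
    for z_index, sequences in hits_by_subset.items():
        for seq in sequences:
            for i, residue in enumerate(seq):
                if i != z_index:               # skip the fixed residue itself
                    tallies[i - z_index][residue] += 1
    return tallies


def aromatic_count(tallies, offset):
    """Number of hits carrying an aromatic residue at the given offset."""
    return sum(c for res, c in tallies[offset].items() if res in AROMATIC)


tallies = tally_by_offset(EXAMPLE_HITS)
print(dict(tallies[1]), aromatic_count(tallies, 1))
```

In this invented example all three hits carry an aromatic residue at +1 even though they come from different sub-sets, so their evidence can be pooled.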

In this invention there is no need to separate modified from unmodified sequences because 
of the self deconvoluting nature of the library. The assay screen produces a series of hits, 
the patterns of which reveal the unique sequences in each well. This enables a pattern of 
substrate preferences to be determined for any enzyme. 

The unique sequences obtained using this invention can be used to provide substrates for 
high throughput assays and provide detailed information about the active site to aid rational 
drug design. 

This invention can also be used as an inhibitor library to screen against known modifying
enzymes where a known substrate exists and can be set up in an assay format. Those skilled 
in the art would realise that by replacing the fixed residue with a suitable compound that is 
not modified an inhibitor library can be constructed. For example if a modifiable fixed 
tyrosine were to be changed to a tyrosine derivative residue that cannot be phosphorylated, 
such as halogenated tyrosine, dopamine, or tyrosine substituted by aromatic compounds, 
then an inhibitor library will be formed. This could allow the more direct identification of 
prototype inhibitors of enzymes for rational drug design. 

Use of these libraries could be extended to other systems where a defined endpoint is 
desired, but the target enzyme is unknown. Such examples could include, but are not 
limited to, bacterial lysis in growing cultures or inhibiting phosphorylation of transcription 
factors in cell lysates. 

In one embodiment of this invention the sequences identified by this method are 
considerably smaller than have previously been reported for library screens on protein 
kinase substrates, which makes them more amenable to computer modelling and drug 
design. Furthermore, this methodology provides information about the relative relationships 
between neighbouring residues of active substrates; information which is not available from 
a straightforward oriented degenerate peptide library approach used by Cantley (Songyang 
et al, 1994). Thus, this novel methodology provides a significant improvement in the 
quality of substrate based information that is achievable, in comparison to that produced 
from previously described methods. 

This invention allows data to be obtained from single peptide rankings which could be used 
to rationally design sets of enzyme inhibitor molecules which compete with the 
physiological substrate for binding to the active site of the enzyme. 

Description of the Drawings

Figure 1. Design of protein tyrosine kinase library. Each peptide consists of a biotin tag, an
epsilon amino hexanoic acid spacer and 5 amino acids, including a phosphorylatable 
tyrosine residue. Each of the amino acid positions A-H is varied as described. For example, 
A1-10 means that position A is varied using 10 defined amino acids.

Figure 2. Best substrates identified by screening tyrosine library sub-sets 1 to 5 against 
ZAP-70 protein tyrosine kinase. The protein tyrosine kinase library described in Figure 1
was phosphorylated for 30 minutes at 30°C using the catalytic domain of human ZAP-70. 
Peptides were captured using streptavidin-coated 96 well plates and phosphotyrosine
detected using anti-phosphotyrosine antibody, anti-mouse IgG-HRP and
tetramethylbenzidine (see experimental methods). Best substrates were identified as those 
which gave the highest amount of phosphate incorporation. 

Figure 3. Km determination of Biotin-εAHA-DEEDYFE(Nle) [SEQ ID NO. 3]. The
catalytic domain of human ZAP-70 was used to phosphorylate varying concentrations of 
peptide for 10 minutes at 30°C in the presence of 33P-γ-ATP. Peptide capture was
performed using streptavidin filter plates, scintillation fluid added, and counting performed
using a beta-counter (see experimental methods). Samples were assayed in triplicate. 

Figures 4 to 17. Component distributions in the plates of a library matrix. 

Description of the Invention

This invention provides for the active site mapping of enzymes which catalyse covalent 
modification including, but not limited to, phosphorylation, acylation and dephosphorylation,
in which a residue such as a tyrosine, serine, threonine, histidine, aspartic acid or any 
other residue containing an appropriate side chain is modified. 

Thus, the invention provides a method for determining an amino acid sequence motif or a
peptidomimetic sequence motif containing an active site capable of being bound by an 
enzyme which catalyses covalent modification of a substrate molecule, comprising:

a) contacting the enzyme with a library consisting of a number of oriented 
degenerate library subsets of molecules, each subset comprising 
unmodified degenerate motif sequences each having n residues and each 
having a modifiable residue at a different fixed non-degenerate position, 
under conditions which allow for modification of molecules which are a 
substrate for the enzyme; 

b) allowing the enzyme to modify modifiable residues in library subsets 
containing molecules having an active substrate site for the enzyme; 

c) deconvoluting the oriented degenerate library subsets of the library, in situ
without separating modified from unmodified molecules, so as to reveal 
the sequence of any motif which has been modified by covalent binding of 
the enzyme; 

wherein each library subset is of formula (I) 

(Xaa)x Zaa (Xaa)y (I) [SEQ ID No. 1]

wherein 

Zaa is a non-degenerate modifiable natural or unnatural amino acid residue or 
peptidomimetic; 

Xaa is any natural or unnatural amino acid residue or peptidomimetic; 
x and y are each independently 0 or an integer; 
(x + y) = (n-1); and

n = an integer from 3 to 8, preferably 5. 

This invention can be applied for instance to a protein tyrosine kinase in order to exemplify
the technology. It provides a rapid method of identifying discrete protein kinase substrate 
sequences which allows pharmacophore generation and design of active site inhibitors. This 
invention can also be used to directly identify protein kinase inhibitor molecules. General
formula: (Xaa)x Tyr (Xaa)y [SEQ ID No. 2].

In the first exemplification of this invention, a recombinant form of the human ZAP-70
enzyme was used in an in vitro phosphorylation reaction to phosphorylate the five substrate 
sub-libraries which scan the sequence -4 to +4 around a central tyrosine residue (Figure 1). 
The libraries were arranged in 96 well microtitre plate format with pools of 20 peptides in 
each microtitre well. However, those skilled in the art will realise that the library can be 
constructed on any scale. For example the library Sub-Set can be miniaturised on a "chip" 
scale or constructed on a large bulk scale depending on the requirements of the library. 

Library peptides were made with biotin tags, which allowed peptide capture on 
streptavidin-coated microtitre plates. Detection of phosphotyrosine was achieved using
anti-phosphotyrosine antibody detection in an ELISA assay using tetramethylbenzidine 
substrate and recording absorbance at 450 nm. Background absorbance readings of 0.1 to 
0.2 were recorded while the highest substrate peptide value was 1.5. Deconvolution of the 
hit peptides was performed as described in WO 97/42216 and Example 5. Clearly defined
substrates were deconvoluted in library sub-sets 1 to 4, but not in 5. This probably
reflects the absolute requirement of ZAP-70 for an amino acid residue in the -1 position.

For the purpose of this exemplification, the peptides used were tagged with d-Biotin and a 
linker (epsilon amino hexanoic acid or some other spacing group). In principle any tag and 
linker can be used, although this invention also provides that a tag and linker do not have
to be present if mass spectroscopy, for example, is used to identify the peptide hits. The
purpose of the tag is solely to enable capture of all of the peptides (whether modified or
not) so that excess reagents can be washed away. The reporting systems to detect peptide
modification can include, but are not limited to, antibody recognition, radioactive assay or
mass spectroscopy.

In the library used to exemplify the invention, a Biotin tag was chosen because we believed 
that this would give improved results. This tag was chosen because of
the high level of positively charged groups on the enzyme in the area in which the tag sits.
This charged area would cause unfavourable interactions with tags more commonly used by
others in the field, such as poly-lysine or poly-arginine. We would expect this reasoning to
be applicable to any enzyme which binds a highly negatively charged molecule such as but 
not limited to ATP, close to the peptide binding site. 

Tags are preferably non-peptidic, with as little charge, either positive or negative, as
possible. Biotin is a good example of this. The aim is to minimise the interactions of the
tag with the protein so the resultant hits are largely due to the binding of the peptides rather 
than reflecting the binding of the tag. The best method of all if this argument is applied to 
its logical conclusion would be to not use a tag at all and use mass spectroscopy to identify 
the peptides. However, currently this approach is of limited value due to the time taken to 
run and analyse a library of the size used here to exemplify the invention. 

The results obtained from the library screen clearly demonstrated amino acid residues 
preferred by the protein kinase at each of the -4 to +4 sites (Figure 2). The 5-mer peptides
overlapped to give information on amino acid preference at each of the binding positions -4 
to +4. To confirm this, a consensus peptide, Biotin-εAHA-DEEDYFE(Nle) [SEQ ID No.
3], representing the best -4 to +4 amino acids was made and tested as a substrate (Figure 3).
This substrate gave a Km against ZAP-70 of 15.79 µM, which is better than the best
ZAP-70 substrate described in the literature, a longer peptide of 14 amino acids with a tag of 3
arginines and a Km of 29 µM (Wandenburg et al, 1996).
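
To give a feel for what the two Km values imply, the following sketch compares the fraction of maximal rate reached by each peptide at a few substrate concentrations. It assumes simple Michaelis-Menten kinetics, which the specification does not state explicitly, and it says nothing about Vmax, which may differ between the two peptides. The function name `fractional_rate` and the chosen concentrations are illustrative assumptions.

```python
KM_CONSENSUS_UM = 15.79   # Km reported in the text for the consensus peptide (µM)
KM_LITERATURE_UM = 29.0   # Km quoted in the text for the literature peptide (µM)


def fractional_rate(substrate_um, km_um):
    """v / Vmax for Michaelis-Menten kinetics: [S] / (Km + [S])."""
    if substrate_um < 0 or km_um <= 0:
        raise ValueError("concentration must be >= 0 and Km must be > 0")
    return substrate_um / (km_um + substrate_um)


# Illustrative substrate concentrations in µM (not taken from the source).
for s in (5.0, 15.79, 29.0, 100.0):
    consensus = fractional_rate(s, KM_CONSENSUS_UM)
    literature = fractional_rate(s, KM_LITERATURE_UM)
    print(f"[S] = {s:6.2f} µM  consensus {consensus:.2f}  literature {literature:.2f}")
```

By definition each peptide reaches half of its own maximal rate when the substrate concentration equals its Km, so the lower Km means half-saturation is reached at a lower peptide concentration.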

In the second application of this invention, a recombinant form of the human Syk enzyme 
was used in an in vitro phosphorylation reaction to phosphorylate the five substrate sub- 
libraries which scan the sequence -4 to +4 around a central tyrosine residue, as previously 
performed for the ZAP-70 library. Detection of phosphotyrosine was achieved using anti- 
phosphotyrosine antibody detection in an ELISA assay using tetramethylbenzidine 
substrate and recording absorbance at 450 nm. Background absorbance readings of 0.10 
were recorded while the highest substrate peptide value was 1.46. Deconvolution of the
hit peptides was performed as described in WO 97/42216 and Example 5. Clearly defined
substrates were deconvoluted in all library sub-sets.

In the third application of this invention, a recombinant form of the human CSK enzyme 
was used in an in vitro phosphorylation reaction to phosphorylate the five substrate sub- 
libraries which scan the sequence -4 to +4 around a central tyrosine residue, as previously 
performed for the ZAP-70 library. Detection of phosphotyrosine was achieved using anti- 
phosphotyrosine antibody detection in an ELISA assay using tetramethylbenzidine 
substrate and recording absorbance at 450 nm. Background absorbance readings of 0.04 
were recorded while the highest substrate peptide value was 0.22. Deconvolution of the 
hit peptides was performed as described in WO 97/42216 and Example 5. Clearly defined
substrates were deconvoluted in all library sub-sets. 

In the fourth application of this invention, a recombinant form of the Abelson murine 
leukaemia virus protein tyrosine kinase v-Abl was used in an in vitro phosphorylation 
reaction to phosphorylate the library sub-set 4 which scans the sequence -1 to +3 around a 
zero position tyrosine residue, as previously performed for the ZAP-70 library. Detection of 
phosphotyrosine was achieved using anti-phosphotyrosine antibody detection in an
ELISA assay using tetramethylbenzidine substrate and recording absorbance at 450 nm. 
Background absorbance readings of 0.11 were recorded while the highest substrate 
peptide value was 0.32. Deconvolution of the hit peptides was performed as described in 
WO 97/42216 and Example 5. Clearly defined substrates were deconvoluted in the library
sub-set. 
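
The absorbance figures quoted for the four tyrosine kinase screens can be compared as signal-to-background ratios. The short sketch below simply divides the highest reported substrate value by the reported background for each enzyme; the ratio itself is not a quantity used in the specification, and the names `READINGS` and `signal_to_background` are hypothetical.

```python
# Absorbance at 450 nm as quoted in the text: background range and top substrate.
READINGS = {
    "ZAP-70": {"background": (0.1, 0.2), "top": 1.5},
    "Syk": {"background": (0.10, 0.10), "top": 1.46},
    "CSK": {"background": (0.04, 0.04), "top": 0.22},
    "v-Abl": {"background": (0.11, 0.11), "top": 0.32},
}


def signal_to_background(top, background):
    """Return (lowest, highest) ratio given a (low, high) background range."""
    low, high = background
    return top / high, top / low


ratios = {name: signal_to_background(r["top"], r["background"])
          for name, r in READINGS.items()}
for name, (worst, best) in ratios.items():
    print(f"{name:7s} signal/background between {worst:.1f} and {best:.1f}")
```

On these figures the assay window differs considerably between enzymes, which is worth bearing in mind when deciding which wells count as hits.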

In the fifth application of this invention, the invention was used to map the substrate 
specificity of a protein serine or serine/threonine kinase (which include I-kappa B kinase 
beta and cAMP-dependent protein kinase [cAPK]). A protein serine or serine/threonine 
kinase enzyme was used in an in vitro phosphorylation reaction to phosphorylate the five 
substrate sub-libraries which scan the sequence -4 to +4 around a central serine residue. 
The library was synthesised as for the protein tyrosine kinase ZAP-70 library, save that the
tyrosine fixed residues were replaced with a serine which was then scanned through the 
five sub-libraries. Detection of phosphoserine was achieved using anti-phosphoserine 
antibody detection in an ELISA assay using tetramethylbenzidine substrate and recording 
absorbance at 450 nm. Deconvolution of the hit peptides was performed as described in 
WO 97/42216 and Example 5. 

General formula: (Xaa)x Ser (Xaa)y [SEQ ID No. 4].

Library Sub-Set 1 Xaa-Xaa-Xaa-Xaa-Ser 

Library Sub-Set 2 Xaa-Xaa-Xaa-Ser-Xaa 

Library Sub-Set 3 Xaa-Xaa-Ser-Xaa-Xaa

Library Sub-Set 4 Xaa-Ser-Xaa-Xaa-Xaa 

Library Sub-Set 5 Ser-Xaa-Xaa-Xaa-Xaa

Likewise the Library Sub-Sets can be synthesised for the mapping of threonine kinases by 
the synthesis of a library containing the threonine residue to allow phosphorylation by 
enzymes recognising this residue. 

It will be realised by those skilled in the art that the replacement of the recognition residue
such as the tyrosine, or serine, with a residue that is covalently modified by the enzyme to 
be mapped allows the active site of any such enzyme to be determined according to the 
invention. 

The invention will now be described by reference to the following examples. 

Example 1

In this example the invention was used to map the active catalytic site of ZAP-70, a protein 
kinase enzyme that catalyses the phosphorylation of a tyrosine residue. The example 
illustrates the synthesis of a number of compounds, and their use as a sub-set library for the 
mapping of the enzyme so as to allow the subsequent identification and synthesis of single 
specific substrates for the enzyme. 

Synthesis of Peptide Compounds. 

Preparation of Crown Assembly 

The peptide compounds were synthesised in parallel fashion using Fmoc-Rink-DA/MDA 
derivatised gears (ex Chiron Mimotopes, Australia) loaded at approximately 1.6 µmol per
crown. Prior to synthesis each crown was connected to its respective stem and slotted into 
the 8 x 12 stem holder. Coupling of the amino acids employed standard Fmoc amino acid 
chemistry as described in 'Solid Phase Peptide Synthesis', E. Atherton and R.C. Sheppard, 
IRL Press Ltd, Oxford, UK, 1989. 

Removal of N-Fmoc Protection

A 250 ml solvent resistant bath is charged with 200 ml of a 20% piperidine/DMF solution. 
The multipin assembly is added and deprotection allowed to proceed for 30 minutes. The
assembly is then removed and excess solvent removed by brief shaking. The assembly is
washed consecutively (200 ml each) with DMF (5 minutes) and MeOH (5 minutes, 5
minutes, 5 minutes) and left to air dry for 15 minutes.

Quantitative UV Measurement of Fmoc Chromophore Release 

A 1 cm path length UV cell is charged with 1.2 ml of a 20% piperidine/DMF solution and 
used to zero the absorbance of the UV spectrometer at a wavelength of 290 nm. A UV
standard is then prepared consisting of 5.0 mg Fmoc-Asp(OBut)-Pepsyn KA (0.08 mmol/g)
in 3.2 ml of a 20% piperidine/DMF solution. This standard gives Abs290 = 0.55-0.65 (at
room temperature). An aliquot of the multipin deprotection solution is then diluted as
appropriate to give a theoretical Abs290 = 0.6, and this value compared with the actual
experimentally measured absorbance, showing the efficiency of the previous coupling reaction.
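
The dilution needed to bring the deprotection solution to the same Fmoc concentration as the UV standard can be estimated with a short calculation. The sketch below assumes quantitative Fmoc release, a full assembly of 96 crowns at 1.6 µmol each, the 200 ml deprotection bath described earlier, and a linear absorbance response; none of these assumptions is stated as a worked calculation in the specification, and the function names are hypothetical.

```python
def fmoc_standard_conc(mass_mg=5.0, loading_mmol_per_g=0.08, volume_ml=3.2):
    """Fmoc concentration of the UV standard in µmol/ml (mg x mmol/g = µmol)."""
    return mass_mg * loading_mmol_per_g / volume_ml


def dilution_factor(n_crowns=96, loading_umol=1.6, bath_ml=200.0):
    """Fold dilution bringing the deprotection bath to the standard's concentration.

    Assumes every crown releases its full Fmoc loading into the bath.
    """
    released_conc = n_crowns * loading_umol / bath_ml   # µmol/ml in the bath
    return released_conc / fmoc_standard_conc()


print(f"standard: {fmoc_standard_conc():.3f} µmol/ml")
print(f"dilute the deprotection solution about {dilution_factor():.1f}-fold")
```

A measured absorbance below the theoretical value after this dilution would then suggest incomplete coupling in the previous cycle.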

Coupling Of Standard Amino Acid Residues 

Coupling reactions were performed by charging the appropriate wells of a polypropylene 
96 well plate with the pattern of activated solutions required during a particular round of 
coupling. Standard couplings for each gear (approx. 1.6 µmole) were performed in DMF (300 µl).

Coupling of an Amino-acid Residue To Appropriate Well 

Whilst the multipin assembly is drying, the appropriate N-Fmoc amino acid pfp esters (10
equivalents calculated from the loading of each crown) and HOBt (10 equivalents) required 
for the particular round of coupling are accurately weighed into suitable containers. 
Alternatively, the appropriate N-Fmoc amino acids (10 equivalents calculated from the
loading of each crown), the desired coupling agent, e.g. HBTU (9.9 equivalents calculated from
the loading of each crown), an activating agent, e.g. HOBt (9.9 equivalents calculated from the
loading of each crown), and NMM (19.9 equivalents calculated from the loading of each
crown) are accurately weighed into suitable containers.

The protected and activated Fmoc amino acid derivatives were then dissolved in DMF (300
µl for each gear, e.g. for 20 gears, 20 x 10 eq. x 1.6 µmoles of derivative would be dissolved
in 10 ml DMF). The appropriate derivatives were then dispensed to the appropriate wells 
ready for commencement of the 'coupling cycle'. As a standard, coupling reactions were 
allowed to proceed for 6 hours. The coupled assembly was then washed as detailed below. 
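
The reagent quantities for a round of coupling follow directly from the crown loading and the stated equivalents. The sketch below works through the 20-gear example from the text for the HBTU/HOBt/NMM route; the function name `reagent_micromoles` and the dictionary layout are hypothetical, and the sketch computes molar amounts only, not solvent volumes.

```python
# Equivalents per crown as stated in the text for the HBTU/HOBt/NMM route.
EQUIVALENTS = {
    "Fmoc-amino acid": 10.0,
    "HBTU": 9.9,
    "HOBt": 9.9,
    "NMM": 19.9,
}


def reagent_micromoles(n_gears, loading_umol=1.6, equivalents=EQUIVALENTS):
    """Micromoles of each reagent: gears x equivalents x loading per gear."""
    if n_gears < 0:
        raise ValueError("n_gears must be non-negative")
    return {name: n_gears * eq * loading_umol for name, eq in equivalents.items()}


amounts = reagent_micromoles(20)
for name, umol in amounts.items():
    print(f"{name:16s} {umol:7.1f} µmol")
```

For 20 gears this gives 320 µmol of the amino acid derivative, matching the 20 x 10 eq. x 1.6 µmoles quoted in the text.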

Coupling of d-Biotin acid to pins 

d-Biotin (10 eq), 1-hydroxybenzotriazole.H2O (10 eq), BOP (9.95 eq) and NMM (19.9 eq)
were dissolved in DMF (0.3 mL per well) and agitated for 2 minutes. 300 µL of solution
was dispensed to each well of a 96-well polypropylene plate. The gears were then added to 
the solution and left for 24 hours. Fresh solution was made up, the gears washed in DMF 
for 5 minutes and then added to the fresh coupling mixture and left a further 24 hours. 

The pin assembly was removed from the plate, shaken free of excess liquid then immersed 
in DMF (200mL) for 5mins. The assembly was again shaken then immersed in MeOH 
(200mL, 3 x 5mins) and allowed to air dry. 

Washing Following Coupling

If a 20% piperidine/DMF deprotection is to immediately follow the coupling cycle, then the
multipin assembly is briefly shaken to remove excess solvent, washed consecutively
(200 ml each) with MeOH (5 minutes) and DMF (5 minutes) and de-protected (see 6.2). If the
multipin assembly is to be stored or reacted further, then a full washing cycle consisting of
brief shaking then consecutive washes (200 ml each) with DMF (5 minutes) and MeOH (5
minutes, 5 minutes, 5 minutes) is performed.

Following these general methods, the peptide libraries shown in Figure 1 were sequentially 
assembled by applying the appropriate coupling procedure at the correct cycle during 
synthesis. 

Acidolytic Mediated Cleavage of Peptide-Pin Assembly 

Acid mediated cleavage protocols were strictly performed in a fume hood. A polystyrene 
96 well plate (1 ml/well) was labelled, then the tare weight measured to the nearest mg. 
Appropriate wells were charged with a trifluoroacetic acid/triisopropylsilane (95:5, v/v, 600
µl) cleavage solution, in a pattern corresponding to that of the multipin assembly to be
cleaved. 

The multipin assembly is added, the entire construct covered in tin foil and left for 2 hours. 
The multipin assembly is then added to another polystyrene 96 well plate (1 ml/well)
containing trifluoroacetic acid/triethylsilane (95:5, v/v, 600 µl) (as above) for 5 minutes.

Work up of Cleaved Peptides 

The primary polystyrene cleavage plate (2 hour cleavage) and the secondary polystyrene
plate (5 minute wash) were then placed in the SpeedVac and the solvents removed 
(minimum drying rate) for 90 minutes. 

The contents of the secondary polystyrene plate were transferred to their corresponding 
wells on the primary plate using an acetonitrile/water/acetic acid (50:45:5, v/v/v) solution 
(3 x 150 µl) and the spent secondary plate discarded.

Analysis of Products 

A 5 µL aliquot from each well is diluted to 100 µl with 0.1% aq. TFA, then a 10 µL aliquot
from this plate diluted with a further 100 µl 0.1% aq. TFA. The double diluted plate was
analysed by HPLC-MS. 

Final Lyophilisation of Peptides 

The plate was covered with tin foil, held to the plate with an elastic band. A pin prick was 
placed in the foil directly above each well and the plate placed at -80°C for 30 minutes. The 
plate was then lyophilised on the 'Heto freeze drier' overnight. Finally, the dried plate was 
weighed. The total cleaved peptide was quantified (by weight) and the average content of 
each peptide calculated. Since all the peptides present have originated from the same 
peptide-pin assembly, cleaved under identical conditions, it is reasonable to assume that the 
contents of each well are roughly equimolar. 

Protein kinase cloning, expression and purification 
Polymerase chain reaction (PCR) and downstream cloning 

The coding sequence for human ZAP-70 amino acids 306-615 was amplified from Jurkat T 
cell cDNA by PCR (2 minutes at 94°C, followed by 35 cycles of 15 seconds at 94°C, 30 
seconds at 65°C and 2 minutes at 72°C, and a final single 5 minute incubation at 72°C) using 
the primers: 

5' CCGGGATCCGCCATGCCCATGGACACGAGCGTGTAT 3' [SEQ ID No. 5] 

5' GGGGGATCCTCAGTGGTGGTGGTGGTGGTGGGCACAGGCAGCCTCAGC 
CTTCTGTGT 3' [SEQ ID No. 6] 

The PCR amplicon was cloned into the Bam HI site of pUC19 and the sequence confirmed 
using M13-20 and reverse primers on an Applied Biosystems Prism 310 sequencer as 
described by the manufacturer. The Bam HI ZAP-70 insert was excised from the sequencing 
vector and ligated into the baculoviral transfer vector pAcUW51 (Pharmingen). 

Generation of ZAP-70 enzyme using baculovirus 

Homologous recombination with wild type baculoviral DNA was then performed in Sf9 
insect cells and the viral supernatant harvested. Plaque purified virus was exposed to several 
viral amplification steps then used at a titre of 3 x 10^9 PFU/ml to infect 3 l of Sf9 cells at 
1 x 10^6 cells/ml, at an MOI of 10, in an Applecon bioreactor using 60% dissolved oxygen. 
Cells were harvested 3 days post infection. 

Protein purification 

The infected cell pellet was lysed in 50 mM Tris-HCl, pH 7.5, 0.15 M NaCl, 25% sucrose, 
1 mM 4-nitrophenol phosphate, 1 mM sodium orthovanadate and protease inhibitors. 
Following homogenisation using a dounce pestle B, the cleared lysate was loaded onto a 
cobalt-sepharose column. After column washing with lysis buffer, elution was performed 
with an imidazole gradient and ZAP-70 fractions identified by protein kinase activity 
against the peptide substrate Lys-Lys-Lys-Lys-Ala-Asp-Glu-Glu-Asp-Tyr-Phe-Ile-Pro-Pro-Ala 
as described in Casnellie et al., 1991. 

Library Screening 

Library peptides were phosphorylated in pools of 20 peptides at a final concentration of 1 
total peptide in 50 mM HEPES, pH 7.5, 0.1% Triton X-100, 100 µM ATP, 10 mM 
MnCl2, 1 mM DTT and 0.2 mM sodium orthovanadate for 30 minutes at 30°C. These 
reaction mixtures were then stopped using 100 mM EDTA, 6 mM adenosine, transferred to 
streptavidin-coated microtitre plates and allowed to bind for 30 minutes at 20°C. 
Unincorporated reaction products were washed from the plate using PBS/0.1% Tween 20, 
then the plate was incubated with anti-phosphotyrosine antibody (Sigma mouse 
monoclonal clone PT66) in 2% BSA/PBS/Tween for 1 hour at 20°C. Unbound antibody 
was removed by plate washing using PBS/Tween, then the plate was incubated with rabbit 
anti-mouse-HRP (Amersham) for a further 1 hour. HRP detection was then performed with 
tetramethylbenzidine (TMB) substrate, measuring absorbance at 450 nm using a 
spectrophotometer (Dynex MRX). 

The best substrates were identified as those which gave the highest amount of phosphate 
incorporation. The library subsets were deconvoluted according to the teaching of 
WO 97/42216: this gives an immediate determination of the unique sequence of any 
phosphorylated motif without the need for further synthesis or sequencing (Figure 2 [SEQ 
ID Nos. 7, 8, 9, 10]). 

Peptide Km determination 

Biotin-tagged peptides were phosphorylated at varying concentrations in 50 mM HEPES, 
pH 7.5, 0.1% Triton X-100, 200 µM ATP, 10 mM MnCl2, 1 mM DTT and 0.2 mM sodium 
orthovanadate using 0.5 µCi 33P-γ-ATP for 10 minutes at 30°C. The reactions were stopped 
using 2 M guanidine hydrochloride, diluted 1 in 10 in water, then 5 µl reaction mix spotted 
onto SAM™ titre plates (Promega). Unincorporated reaction products were washed as 
described by the manufacturer, then 20 scintillation liquid added and the plate counted on a 
Packard TopCount beta-counter. Km was calculated using a non-linear one site hyperbola 
model (ATP was added in excess to negate influence of the ATP binding site on the 
substrate site kinetics) (Figure 3). 
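
A one site hyperbola fit of this kind can be sketched as follows. This is a minimal illustration only, not the software used for Figure 3: the function name fit_km and the search strategy are assumptions. It fits v = Vmax x S / (Km + S) by least squares, using the fact that for any trial Km the best Vmax has a closed form, and it assumes positive substrate concentrations and a single minimum within the searched range.

```python
import math

def fit_km(substrate_conc, rates):
    """Least-squares fit of v = Vmax * S / (Km + S); returns (Km, Vmax).

    Hypothetical helper. For a trial Km the optimal Vmax is closed-form,
    so only Km is searched (golden-section search on log Km).
    """
    def vmax_and_sse(km):
        x = [s / (km + s) for s in substrate_conc]
        vmax = sum(v * xi for v, xi in zip(rates, x)) / sum(xi * xi for xi in x)
        sse = sum((v - vmax * xi) ** 2 for v, xi in zip(rates, x))
        return vmax, sse

    # Bracket Km well outside the tested concentrations (requires S > 0).
    lo = math.log(min(substrate_conc) / 100.0)
    hi = math.log(max(substrate_conc) * 100.0)
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(200):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if vmax_and_sse(math.exp(a))[1] < vmax_and_sse(math.exp(b))[1]:
            hi = b   # minimum lies in [lo, b]
        else:
            lo = a   # minimum lies in [a, hi]
    km = math.exp((lo + hi) / 2)
    return km, vmax_and_sse(km)[0]

# Synthetic, noise-free data generated with Km = 5 and Vmax = 100 (illustrative units).
concentrations = [0.5, 1, 2, 5, 10, 20, 50]
observed = [100 * s / (5 + s) for s in concentrations]
km_fit, vmax_fit = fit_km(concentrations, observed)
```

With real counts the fitted values carry experimental error, and a dedicated curve-fitting package would normally also report confidence intervals.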

Example 2 

In this example the invention was used to map the active catalytic site of Syk, a protein 
kinase enzyme that catalyses the phosphorylation of a tyrosine residue. The example 
illustrates the positional scanning of the sub-set libraries for the mapping of the enzyme, 
and their use so as to allow the subsequent identification of the preferred substrates of the 
enzyme catalytic site. 

The mapping and assessment of the catalytic site was performed as detailed in Example 1. 
The substrate preferences were deconvoluted as detailed in WO 97/42216 and are detailed 
below. 



Library Sub-Set 1 Asp-Glu-Glu-Asp-Tyr [SEQ ID No. 11] 

Library Sub-Set 2 Asp-Glu-Glu-Tyr-Asp [SEQ ID No. 12] 

Library Sub-Set 3 Asp-Glu-Tyr-Glu-Asp [SEQ ID No. 13] 

Library Sub-Set 4 Asp-Tyr-Glu-Glu-Val [SEQ ID No. 14] 

Library Sub-Set 5 Tyr-Ser-Ile-Ile-Nle [SEQ ID No. 15] 

Example 3 

In this example the invention was used to map the active catalytic site of CSK. The subset 
library was used to scan the enzyme active site so as to allow the subsequent identification 
and synthesis of the preferred specific substrates for the enzyme as listed below. 



Library Sub-Set 1 Asp-Glu-Glu-Glu-Tyr [SEQ ID No. 16] 

Library Sub-Set 2 Asp-Glu-Glu-Tyr-Phe [SEQ ID No. 17] 

Library Sub-Set 3 Asp-Glu-Tyr-His-Asn [SEQ ID No. 18] 

Library Sub-Set 4 Asp-Tyr-His-Leu-Phe [SEQ ID No. 19] 

Library Sub-Set 5 Tyr-Pro-Ile-Glu-Val [SEQ ID No. 20] 



Example 4 

In this example the invention was used to map the active catalytic site of v-Abl, a protein 
kinase enzyme that catalyses the phosphorylation of a tyrosine residue. For this enzyme 
only the Library Sub-Set 4 (i.e. X-Tyr-X-X-X according to SEQ ID No. 2) was scanned 
with the enzyme. The active site recognition sequence for this enzyme for this 
Sub-Set was serine-tyrosine-phenylalanine-histidine-glutamine [SEQ ID No. 21]. 

Example 5 

Deconvolution methodology from WO 97/42216. 

Libraries or sub-libraries are arranged as two orthogonal sets of mixtures of compounds in 
solution providing two complementary combinatorial libraries indexed in two dimensions for 
autodeconvolution. These are referred to as primary and secondary libraries. 

The general concept of two orthogonal sets of mixtures indexed in two dimensions can be 
applied to various permutations of numbers of wells, plate layout, number of permutations 
per mixture etc. However, according to the invention the numerical interrelationship is 
defined as indicated below for libraries containing compounds with four variable groups B, 
C, D and E. 

General Deconvolution Formulae 
-Bb-Cc-Dd-n(Ee)- (I) 

1) Primary and Secondary plates preferably have the same number of compounds per 
well [X]; otherwise there are two values, Xp and Xs respectively. 

2) The primary library comprises [np] plates. 

If Rp x Cp = Rs x Cs, then the number of plates in the secondary library is also [np]. If not, the 
number of plates in the secondary library [ns] is: 

ns = (Rp x Cp x np) / (Rs x Cs) 

e.g., a primary library of np = 4, Rp = 8, Cp = 10 can be set out in an Rs = 4, Cs = 5 
secondary library with the number of plates equal to: 

ns = (8 x 10 x 4) / (4 x 5) 

= 16 plates. 
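
The plate-count relationship above can be checked with a short sketch. The function name and the decision to reject layouts that do not divide evenly are illustrative assumptions rather than part of WO 97/42216.

```python
def secondary_plate_count(n_primary, rp, cp, rs, cs):
    """Number of secondary plates [ns] = (Rp x Cp x np) / (Rs x Cs)."""
    total_wells = rp * cp * n_primary
    wells_per_secondary_plate = rs * cs
    if total_wells % wells_per_secondary_plate:
        # Assumption: only layouts giving a whole number of plates are accepted.
        raise ValueError("secondary layout does not divide the primary wells evenly")
    return total_wells // wells_per_secondary_plate

print(secondary_plate_count(4, 8, 10, 4, 5))   # 16, as in the worked example
print(secondary_plate_count(4, 8, 10, 8, 10))  # 4, since Rp x Cp = Rs x Cs
```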

Number of compounds per well 
-Bb-Cc-Dd-np(Ee)- (1) 

Number of possible combinations [k] is given by: 
k=bxcxdxnpxe (2) 

When number of wells on a plate = [N], number of compounds per well = [X] and number of 
plates = [np]. 

k = X x N x np (3) 

However, number of wells [N] is also defined by the number of rows [Rp] and number of 
columns [Cp]. 

N = Rp x Cp (4) 



Combining (3) and (4). 

k = X x Rp x Cp x np (5) 
Combining (2) and (5) 

b x c x d x np x e = X x Rp x Cp x np (6) 

Cancelling [np] from both sides of the equation: 
b x c x d x e = X x Rp x Cp (7) 

Two of the variables (e.g., b and c) on the left side of the equation must each be equal in 
number to the number of columns [Cp], whilst a remaining variable (e.g., d) on the left side 
must be equal in number to the number of rows [Rp]. So: 

Cp^2 x Rp x e = X x Rp x Cp (8) 

Cancelling [Cp] and [Rp] from both sides of the equation: 
Cp x e = X (9) 

where [e] is the number of variants along a fixed row; and if Rp = Cp, then Rp x e = X. 

Example for a 10 x 10 x 8 x 8 format over 4 plates: 
np x e = 8 => e = 2 
10 x 2 = X 
X = 20. 
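
The arithmetic of equations (2), (5) and (9) can be reproduced with the following sketch. The function and parameter names are hypothetical; it assumes, as in equation (8), that b = c = Cp and d = Rp, and that the E variants divide evenly across the plates.

```python
def compounds_per_well(b, c, d, e_total, n_plates):
    """Return (e, X, k) for a -Bb-Cc-Dd-np(Ee)- library.

    e_total is the total number of E variants spread over n_plates plates.
    """
    if b != c:
        raise ValueError("equation (8) assumes b = c = Cp")
    if e_total % n_plates:
        raise ValueError("E variants must divide evenly across the plates")
    e = e_total // n_plates            # np x e = e_total
    x = c * e                          # equation (9): X = Cp x e
    k = b * c * d * n_plates * e       # equation (2)
    assert k == x * d * c * n_plates   # equation (5) with Rp = d and Cp = c
    return e, x, k

print(compounds_per_well(10, 10, 8, 8, 4))  # (2, 20, 6400)
```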

From an understanding of the general deconvolution formulae shown above, those skilled in 
the art will readily appreciate that the advantageous results of self-deconvolution according to 
WO 97/42216 are obtainable utilising a number of different arrangements of wells, plate 
layouts, mixtures etc. 

The technique will be illustrated by reference to a model system for screening a protease with 
two complementary compound libraries, L1 and L2, each containing n x 1600 compounds, of 
the type A-B1-10-C1-10-D1-8-n(E1-2)-F-G [II], in which: 

A = a fluorescor internally quenched by F, preferably an unsubstituted or substituted 
anthranilic acid derivative, connected by an amide bond to B. 

B, C, D, E = natural or unnatural amino acid residues connected together by suitable bonds, 
although B, C, D and E can be any set of groups. 

F = a quencher capable of internally quenching the fluorescor A, preferably an unsubstituted 
or substituted 3-nitrotyrosine derivative. 

G = optionally present and is a hydrophilic moiety, preferably an aspartyl amide moiety. If 
present, G advantageously ensures that all compounds in the library are imparted with 
aqueous solubility. Also, G should not be a substrate for any type of enzyme. 

n = any integer between 1 and 4 inclusive. 

The numbers represented in subscript following residues B, C, D and E refer to the number of 
possibilities from which those residues are selected. Thus, by way of illustrative example, 
A-B1-5-C-D-E1-2-F-G represents a mixture of the following ten compounds: 

A-B1-C-D-E1-F-G 
A-B2-C-D-E1-F-G 
A-B3-C-D-E1-F-G 
A-B4-C-D-E1-F-G 
A-B5-C-D-E1-F-G 
A-B1-C-D-E2-F-G 
A-B2-C-D-E2-F-G 
A-B3-C-D-E2-F-G 
A-B4-C-D-E2-F-G 
A-B5-C-D-E2-F-G 
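
The expansion of such a subscript range into its individual compounds can be sketched as below. The helper name expand_mixture is an illustrative assumption; the output order follows the list above (all B variants with E1, then all with E2).

```python
from itertools import product

def expand_mixture(b_range, e_range):
    """List every compound in the mixture A-B(b_range)-C-D-E(e_range)-F-G."""
    return [f"A-B{b}-C-D-E{e}-F-G" for e, b in product(e_range, b_range)]

compounds = expand_mixture(range(1, 6), range(1, 3))
print(len(compounds))  # 10
print(compounds[0])    # A-B1-C-D-E1-F-G
```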

The general combinatorial formula for each library can be expressed as: 

A1-B10-C10-D8-n(E2)-F1-G1 (III) 

providing 1 x 10 x 10 x 8 x n x 2 x 1 x 1 = 1600n compounds. 

Both compound libraries, L1 and L2, of the above type are synthesized using solid phase 
techniques using the Multipin approach such that each library contains 1600n compounds as 
80n mixtures of 20 distinct, identifiable compounds. These 20 component mixtures are then 
placed separately into each of 80 wells of a 96 well plate (the other two lanes are used for 
control experiments) and then screened against a known quantity of the protease. 
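
The library dimensions quoted above follow directly from formula (III). The sketch below is illustrative only; the function name is an assumption, and the plate count applies to a layout that uses 80 wells per plate, as library L1 does.

```python
def library_shape(n):
    """Return (compounds, mixtures, plates) for A1-B10-C10-D8-n(E2)-F1-G1."""
    if not 1 <= n <= 4:
        raise ValueError("n is an integer between 1 and 4 inclusive")
    compounds = 1 * 10 * 10 * 8 * n * 2 * 1 * 1  # 1600n compounds
    mixtures = compounds // 20                    # 20 compounds per mixture
    plates = mixtures // 80                       # 80 wells used per plate
    return compounds, mixtures, plates

for n in range(1, 5):
    print(n, library_shape(n))
```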

Thus it is important that, regardless of the number of compounds contained in the two 
libraries L1 and L2 (e.g., in the preferred embodiment 1600n, where n = any integer between 
1 and 4), the libraries themselves are complementary and amenable to deconvolution without 
recourse to resynthesis. 

The library layout will now be described with reference to Figures 4 to 17, which exemplify 
component distributions in the plates of a library matrix. 

For example, when n = 1 and the library contains 1600 compounds, in the first column of the 
first row (A1) (Fig. 4) in the first plate (P1) of the library L1 (hereinafter designated as 
location A1, P1, L1) there will be one C component (C1), one D component (D1), the ten B 
components, and the two E components (E1 and E2) (Fig. 5). In the tenth column of the first 
row (A10) in the first plate (P1) of the library L1 (hereinafter designated as location A10, P1, 
L1), there will be one C component (C10), one D component (D1), the ten B components, and 
the two E components (E1 and E2). In the tenth column of the eighth row (H10) in the first 
plate (P1) of the library L1 (hereinafter designated as location H10, P1, L1), there will be one 
C component (C10), one D component (D8), the ten B components, and the two E components 
(E1 and E2). Hence all 1600 components are present in the one plate, because the 80 wells 
each contain 20 components. 

A second complementary library is synthesised as follows (Fig. 6). In the first column of the 
first row (A1) of the first plate (P1) of the library L2 (hereinafter designated as location A1, 
P1, L2), there will be ten C components, two D components (D1 and D2), one B component 
(B1), and one E component (E1). In the tenth column of the first row (A10) of the first plate 
(P1) of the library L2 (hereinafter designated as location A10, P1, L2), there will be ten C 
components, two D components (D1 and D2), one B component (B10), and one E component 
(E1). In the first column of the second row (B1) of the first plate (P1) of the library L2 
(hereinafter designated as location B1, P1, L2), there will be ten C components, two D 
components (D1 and D2), one B component (B1), and one E component (E2). In the tenth 
column of the second row (B10) of the first plate (P1) of the library L2 (B10, P1, L2) there will 
be ten C components, two D components (D1 and D2), one B component (B10), and one E 
component (E2). Hence only the first two rows are used to accommodate 400 compounds in 
total. 

In the first column of the first row (A1) of the second plate (P2) of the library L2 (hereinafter 
designated as location A1, P2, L2), there will be ten C components, two D components (D3 
and D4), one B component (B1), and one E component (E1) (Fig. 7). In the tenth column of 
the first row (A10) of the second plate (P2) of the library L2 (hereinafter designated as 
location A10, P2, L2), there will be ten C components, two D components (D3 and D4), one B 
component (B10), and one E component (E1). In the first column of the second row (B1) of 
the second plate (P2) of the library L2 (hereinafter designated as location B1, P2, L2), there 
will be ten C components, two D components (D3 and D4), one B component (B1), and one E 
component (E2). In the tenth column of the second row (B10) of the second plate (P2) of the 
library L2 (B10, P2, L2), there will be ten C components, two D components (D3 and D4), 
one B component (B10), and one E component (E2). Hence only the first two rows are used to 
accommodate 400 compounds in total. 

In the first column of the first row (A1) of the third plate (P3) of the library L2 (hereinafter 
designated as location A1, P3, L2), there will be ten C components, two D components (D5 
and D6), one B component (B1), and one E component (E1) (Fig. 8). In the tenth column of 
the first row (A10) of the third plate (P3) of the library L2 (hereinafter designated as location 
A10, P3, L2), there will be ten C components, two D components (D5 and D6), one B 
component (B10), and one E component (E1). In the first column of the second row (B1) of 
the third plate (P3) of the library L2 (hereinafter designated as location B1, P3, L2), there will 
be ten C components, two D components (D5 and D6), one B component (B1), and one E 
component (E2). In the tenth column of the second row (B10) of the third plate (P3) of the 
library L2 (B10, P3, L2), there will be ten C components, two D components (D5 and D6), 
one B component (B10), and one E component (E2). Hence only the first two rows are used to 
accommodate 400 compounds in total. 

In the first column of the first row (A1) of the fourth plate (P4) of the library L2 (hereinafter 
designated as location A1, P4, L2), there will be ten C components, two D components (D7 
and D8), one B component (B1), and one E component (E1) (Fig. 9). In the tenth column of 
the first row (A10) of the fourth plate (P4) of the library L2 (hereinafter designated as location 
A10, P4, L2), there will be ten C components, two D components (D7 and D8), one B 
component (B10), and one E component (E1). In the first column of the second row (B1) of 
the fourth plate (P4) of the library L2 (hereinafter designated as location B1, P4, L2), there 
will be ten C components, two D components (D7 and D8), one B component (B1), and one E 
component (E2). In the tenth column of the second row (B10) of the fourth plate (P4) of the 
library L2 (B10, P4, L2) there will be ten C components, two D components (D7 and D8), one 
B component (B10), and one E component (E2). Hence only the first two rows are used to 
accommodate 400 compounds in total. 

In this fashion two complementary libraries, L1 and L2, are prepared. In library L1, each of 
the 80 wells contains a mixture of 20 components, providing 1600 compounds for 
screening. In library L2, four plates are used in which only the first two rows are employed, 
providing 20 wells of 20 components per well per plate, and furnishing the same 1600 
compounds as are present in library L1, but in a format in which no two compounds found 
together in library L1 will be found together in library L2. 
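
The n = 1 layout described above can be modelled to confirm the complementarity property. This is a sketch under stated assumptions: the tuple encoding of a compound as (b, c, d, e) and the function names are illustrative, and the well indexing follows the text (L1: row = D, column = C; L2: plate = D pair, row = E, column = B).

```python
from itertools import product

B, C, D, E = range(1, 11), range(1, 11), range(1, 9), range(1, 3)

def build_l1():
    """L1 (n = 1): well (row d, column c) holds all ten B and both E variants."""
    return {("P1", d, c): {(b, c, d, e) for b, e in product(B, E)}
            for d, c in product(D, C)}

def build_l2():
    """L2 (n = 1): plate p holds D pair (2p-1, 2p); row = E, column = B."""
    wells = {}
    for p, e, b in product(range(1, 5), E, B):
        d_pair = (2 * p - 1, 2 * p)
        wells[(f"P{p}", e, b)] = {(b, c, d, e) for c, d in product(C, d_pair)}
    return wells

l1, l2 = build_l1(), build_l2()

# Both libraries hold the same 1600 compounds as 80 mixtures of 20.
same_compounds = set().union(*l1.values()) == set().union(*l2.values())
all_twenty = all(len(w) == 20 for w in list(l1.values()) + list(l2.values()))

# Complementarity: any L1 well and any L2 well share at most one compound,
# so no two compounds found together in L1 are found together in L2.
complementary = all(len(w1 & w2) <= 1 for w1 in l1.values() for w2 in l2.values())
print(same_compounds, all_twenty, complementary)  # True True True
```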

Thus it is important that the compounds contained in the two libraries L1 and L2 are 
themselves complementary, in that any two compounds which are found together in a 20 
component mixture in the same location (e.g., A1, P1, L1) in library L1 are not found 
together in any of the 20 component mixtures in any location of the library L2. 

Thus, for example, with reference to the primary library P1 L1 of Figure 5 and the secondary 
libraries P1 L2, P2 L2, P3 L2 and P4 L2 of Figures 6-9, it is possible to deconvolute an 
exemplary sequence: 

-B2-C3-D4-E1- 

If the library is a FRET library and this sequence is a substrate, fluorescence will occur in P1 
L1 at C3D4. This gives the information that the substrate is: 

-?-C3-D4-?- 

If fluorescence occurs in P2 L2 at B2E1, it indicates a substrate: 

-B2-?-?-E1- 

The confirmation of the substrate as: 

-B2-C3-D4-E1- 

should be provided by non-fluorescence of P1 L2, P3 L2 and P4 L2, which all contain 
-B2-C3-X-E1- where X is not D4. 
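
The reasoning of this worked example can be sketched in a few lines. The function name and the tuple encoding of a hit are illustrative assumptions; the D pair carried by each L2 plate (D1 and D2 on P1, D3 and D4 on P2, and so on) follows the n = 1 layout described earlier.

```python
def deconvolute(l1_hit, l2_hit):
    """Combine one L1 hit and one L2 hit (n = 1 layout) into a sequence.

    l1_hit: (c, d) read from the column and row of the fluorescent L1 well.
    l2_hit: (plate, b, e) read from the plate, column and row of the L2 well.
    Returns None if the two hits cannot arise from the same compound.
    """
    c, d = l1_hit
    plate, b, e = l2_hit
    d_pair = (2 * plate - 1, 2 * plate)  # D components carried by that L2 plate
    if d not in d_pair:
        return None
    return f"-B{b}-C{c}-D{d}-E{e}-"

print(deconvolute((3, 4), (2, 2, 1)))  # -B2-C3-D4-E1-
```

A hit at C3D4 in L1 paired with a hit on P1, P3 or P4 of L2 returns None, mirroring the expected non-fluorescence of those plates for this sequence.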

In practice it is likely that more than one sequence will result in a substrate. Information as to 
which positions B-C-D-E- are sensitive to change (i.e., require a specific group) and which 
are insensitive (i.e., can tolerate more than one choice of group) in the context of the whole 
sequence gives valuable SAR data which can be used to model and/or synthesise related 
compounds. 



In analogous examples, where separately n = 2, 3 or 4, extra plates are constructed in library 
L1 format to accommodate the component pairs E3 and E4 (n = 2), E5 and E6 (n = 3), and E7 
and E8 (n = 4), respectively. For the respective deconvolution libraries of the type L2, the 
respective rows in the plates P1, P2, P3, and P4 are increasingly filled with the paired 
components D1 and D2, D3 and D4, D5 and D6, and D7 and D8, respectively. 

For example, when n = 3, and the library contains 4800 compounds, in the first column of the 
first row (A1) in the first plate (P1) of the library L1 (hereinafter designated as location A1, 
P1, L1) there will be one C component (C1), one D component (D1), the ten B components, 
and the two E components (E1 and E2). In the tenth column of the first row (A10) in the first 
plate (P1) of the library L1 (hereinafter designated as location A10, P1, L1) there will be one 
C component (C10), one D component (D1), the ten B components, and the two E components 
(E1 and E2). In the tenth column of the eighth row (H10) in the first plate (P1) of the library 
L1 (hereinafter designated as location H10, P1, L1) there will be one C component (C10), one 
D component (D8), the ten B components, and the two E components (E1 and E2). Hence 
1600 components are present in the one plate, because the 80 wells each contain 20 
components. 

In the first column of the first row (A1) in the second plate (P2) of the library L1 (hereinafter 
designated as location A1, P2, L1) there will be one C component (C1), one D component 
(D1), the ten B components, and the two E components (E3 and E4). In the tenth column of 
the first row (A10) in the second plate (P2) of the library L1 (hereinafter designated as 
location A10, P2, L1) there will be one C component (C10), one D component (D1), the ten B 
components, and the two E components (E3 and E4). In the tenth column of the eighth row 
(H10) in the second plate (P2) of the library L1 (hereinafter designated as location H10, P2, 
L1) there will be one C component (C10), one D component (D8), the ten B components, and 
the two E components (E3 and E4). Hence 1600 components are present in the one plate, 
because the 80 wells each contain 20 components. 

In the first column of the first row (A1) in the third plate (P3) of the library L1 (hereinafter 
designated as location A1, P3, L1), there will be one C component (C1), one D component 
(D1), the ten B components, and the two E components (E5 and E6). In the tenth column of 
the first row (A10) in the third plate (P3) of the library L1 (hereinafter designated as location 
A10, P3, L1) there will be one C component (C10), one D component (D1), the ten B 
components, and the two E components (E5 and E6). In the tenth column of the eighth row 
(H10) in the third plate (P3) of the library L1 (hereinafter designated as location H10, P3, L1) 
there will be one C component (C10), one D component (D8), the ten B components, and the 
two E components (E5 and E6). Hence 1600 components are present in the one plate, because 
the 80 wells each contain 20 components. In total the three plates, P1, P2 and P3, contain 
1600 compounds/plate, i.e. 4800 compounds in total. 

For example, when n = 4, and the library contains 6400 compounds, in the first column of the 
first row (A1) in the first plate (P1) of the library L1 (hereinafter designated as location A1, 
P1, L1) there will be one C component (C1), one D component (D1), the ten B components, 
and the two E components (E1 and E2) (Fig. 10). In the tenth column of the first row (A10) in 
the first plate (P1) of the library L1 (hereinafter designated as location A10, P1, L1) there will 
be one C component (C10), one D component (D1), the ten B components, and the two E 
components (E1 and E2). In the tenth column of the eighth row (H10) in the first plate (P1) of 
the library L1 (hereinafter designated as location H10, P1, L1) there will be one C component 
(C10), one D component (D8), the ten B components, and the two E components (E1 and E2). 
Hence 1600 components are present in the one plate, because the 80 wells each contain 20 
components. 

In the first column of the first row (A1) in the second plate (P2) of the library L1 (hereinafter 
designated as location A1, P2, L1) there will be one C component (C1), one D component 
(D1), the ten B components, and the two E components (E3 and E4) (Fig. 11). In the tenth 
column of the first row (A10) in the second plate (P2) of the library L1 (hereinafter 
designated as location A10, P2, L1) there will be one C component (C10), one D component 
(D1), the ten B components, and the two E components (E3 and E4). In the tenth column of 
the eighth row (H10) in the second plate (P2) of the library L1 (hereinafter designated as 
location H10, P2, L1) there will be one C component (C10), one D component (D8), the ten B 
components, and the two E components (E3 and E4). 

In the first column of the first row (A1) in the third plate (P3) of the library L1 (hereinafter 
designated as location A1, P3, L1) there will be one C component (C1), one D component 
(D1), the ten B components, and the two E components (E5 and E6) (Fig. 12). In the tenth 
column of the first row (A10) in the third plate (P3) of the library L1 (hereinafter designated 
as location A10, P3, L1) there will be one C component (C10), one D component (D1), the ten 
B components, and the two E components (E5 and E6). In the tenth column of the eighth row 
(H10) in the third plate (P3) of the library L1 (hereinafter designated as location H10, P3, L1) 
there will be one C component (C10), one D component (D8), the ten B components, and the 
two E components (E5 and E6). 

In the first column of the first row (A1) in the fourth plate (P4) of the library L1 (hereinafter 
designated as location A1, P4, L1) there will be one C component (C1), one D component 
(D1), the ten B components, and the two E components (E7 and E8) (Fig. 13). Likewise, in the 
tenth column of the first row (A10) in the fourth plate (P4) of the library L1 (hereinafter 
designated as location A10, P4, L1) there will be one C component (C10), one D component 
(D1), the ten B components, and the two E components (E7 and E8). In the tenth column of 
the eighth row (H10) in the fourth plate (P4) of the library L1 (hereinafter designated as 
location H10, P4, L1) there will be one C component (C10), one D component (D8), the ten B 
components, and the two E components (E7 and E8). 

A second complementary library is synthesised as follows. In the first column of the first row 
(A1) of the first plate (P1) of the library L2 (hereinafter designated as location A1, P1, L2), 
there will be ten C components, two D components (D1 and D2), one B component (B1), and 
one E component (E1) (Fig. 14). In the tenth column of the first row (A10) of the first plate 
(P1) of the library L2 (hereinafter designated as location A10, P1, L2), there will be the ten C 
components, two D components (D1 and D2), one B component (B10), and one E component 
(E1). In the first column of the eighth row (H1) of the first plate (P1) of the library L2 
(hereinafter designated as location H1, P1, L2), there will be the ten C components, two D 
components (D1 and D2), one B component (B1), and one E component (E8). In the tenth 
column of the eighth row (H10) of the first plate (P1) of the library L2 (hereinafter designated 
as location H10, P1, L2) there will be the ten C components, two D components (D1 and D2), 
one B component (B10), and one E component (E8). Hence the matrix containing all ten 
columns and all eight rows is used to accommodate 1600 compounds in total. 

In the first column of the first row (A1) of the second plate (P2) of the library L2 (hereinafter 
designated as location A1, P2, L2), there will be ten C components, two D components (D3 
and D4), one B component (B1), and one E component (E1) (Fig. 15). In the tenth column of 
the first row (A10) of the second plate (P2) of the library L2 (hereinafter designated as 
location A10, P2, L2), there will be ten C components, two D components (D3 and D4), one B 
component (B10), and one E component (E1). In the first column of the second row (B1) of 
the second plate (P2) of the library L2 (hereinafter designated as location B1, P2, L2), there 
will be ten C components, two D components (D3 and D4), one B component (B1), and one E 
component (E2). In the tenth column of the eighth row (H10) of the second plate (P2) of the 
library L2 (hereinafter designated as location H10, P2, L2), there will be ten C components, 
two D components (D3 and D4), one B component (B10), and one E component (E8). 

In the first column of the first row (A1) of the third plate (P3) of the library L2 (hereinafter 
designated as location A1, P3, L2), there will be ten C components, two D components (D5 
and D6), one B component (B1), and one E component (E1) (Fig. 16). In the tenth column of 
the first row (A10) of the third plate (P3) of the library L2 (hereinafter designated as location 
A10, P3, L2), there will be ten C components, two D components (D5 and D6), one B 
component (B10), and one E component (E1). In the first column of the second row (B1) of 
the third plate (P3) of the library L2 (hereinafter designated as location B1, P3, L2), there will 
be ten C components, two D components (D5 and D6), one B component (B1), and one E 
component (E2). In the tenth column of the eighth row (H10) of the third plate (P3) of the 
library L2 (hereinafter designated as location H10, P3, L2), there will be ten C components, 
two D components (D5 and D6), one B component (B10), and one E component (E8). 

In the first column of the first row (A1) of the fourth plate (P4) of the library L2 (hereinafter 
designated as location A1, P4, L2), there will be ten C components, two D components (D7 
and D8), one B component (B1), and one E component (E1) (Fig. 17). In the tenth column of 
the first row (A10) of the fourth plate (P4) of the library L2 (hereinafter designated as 
location A10, P4, L2), there will be ten C components, two D components (D7 and D8), one B 
component (B10), and one E component (E1). In the first column of the second row (B1) of 
the fourth plate (P4) of the library L2 (hereinafter designated as location B1, P4, L2), there 
will be ten C components, two D components (D7 and D8), one B component (B1), and one E 
component (E2). In the tenth column of the eighth row (H10) of the fourth plate (P4) of the 
library L2 (hereinafter designated as location H10, P4, L2), there will be ten C components, 
two D components (D7 and D8), one B component (B10), and one E component (E8). 

The strategy is thus based on the synthesis of two orthogonal sets of mixtures in solution. 
These solutions are each indexed in two dimensions. Thus the data from a scan identifies the 
most active compounds without the need for decoding or resynthesis. 

The positional preferences of sub-units (in this case amino acids) are optimised with respect 
to all other variant positions simultaneously. The synergistic relationship between all four 
positions is realised and both positive, beneficial and negative, deactivating data are 
generated. This leads to families (sub-populations) of substrates and their sub-unit 
preferences. The data can be fed into molecular modelling programs to generate 
pharmacophore descriptors that encompass both the desirable features (from the positive 
data) and indicate undesirable interactions (from the negative data sets). 

Note that a one dimensional scan only indicates one position at a time as 'most active' and 
does not explore the synergistic relationship between positions. 

The general methodology exemplified above with regard to the use of complementary 
combinatorial FRET libraries for the identification of proteolytic enzyme substrates, is 
equally applicable for identification of compounds from a library which interact with another 
active moiety. 

Combinatorial libraries of compounds containing four variable groups B, C, D and E can be 
produced and interactions with active moieties detected using suitable reporters or markers. 